AI and the Labour Market in Croatian Media

Media Framing Analysis (2021–2023)

Author

Luka Sikic

Published

February 9, 2026

Abstract

This document analyses how Croatian digital media frame the intersection of artificial intelligence and the labour market. The corpus is drawn from the Determ media-monitoring platform (January 2021 through December 2023) and includes only articles containing both AI-related and labour-market-related keywords (intersection logic). Eight interpretive frames, six actor categories, and automated sentiment are examined across time, platform, and media outlet types. The analysis provides a comprehensive exploratory and descriptive foundation for subsequent econometric work.

Note: Data preparation

The corpus was extracted by R/01_extract_corpus.R and enriched by R/02_add_diagnostics.R. Run those scripts first if the data files are missing.
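The extraction scripts are not reproduced here, but the intersection logic mentioned in the abstract can be sketched in a few lines. The keyword lists below are illustrative placeholders, not the actual configuration used by R/01_extract_corpus.R:

```r
library(stringi)

# Hypothetical keyword stems; the real lists live in the extraction config
ai_terms     <- c("umjetna inteligencija", "chatgpt", "strojno učenje")
labour_terms <- c("tržište rada", "radnik", "zaposlen", "otkaz")

# An article is kept only if BOTH keyword groups match (intersection logic)
keep_article <- function(text) {
  txt <- stri_trans_tolower(text)
  stri_detect_regex(txt, paste(ai_terms, collapse = "|")) &
    stri_detect_regex(txt, paste(labour_terms, collapse = "|"))
}

keep_article("ChatGPT mijenja tržište rada")  # TRUE: AI and labour terms present
keep_article("ChatGPT piše pjesme")           # FALSE: no labour-market term
```

Because keywords are matched as substrings, stems such as "zaposlen" also catch inflected Croatian forms ("zaposlenost", "zaposlenika").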

1 Introduction

1.1 Background

Technological change has historically reshaped labour demand in ways that were difficult to predict at the time. Artificial intelligence poses a distinctive challenge because it affects not only routine manual tasks but also non-routine cognitive work such as programming, translation, data analysis, and creative writing.

Media do not merely relay facts; they select, omit, and emphasise, thereby actively shaping public perception. The way media frame the impact of AI on work can influence individual behaviour, corporate strategy, and public policy.

1.2 Research questions

RQ1 (Volume and dynamics) How much media coverage does the AI–labour nexus receive, and how has coverage evolved over time?

RQ2 (Frames) Which interpretive frames dominate, and how has their prevalence shifted?

RQ3 (Actors) Whose perspectives appear most frequently?

RQ4 (Sources) Do different media outlet types frame the topic differently?

2 Data

2.1 Loading the corpus

Show code
if (!file.exists(path_raw_corpus)) {
  stop("Corpus not found at: ", path_raw_corpus,
       "\n\nRun R/01_extract_corpus.R first.")
}

corpus_data <- readRDS(path_raw_corpus)

# --- DATE FILTER: keep only articles before 2024-01-01 ---
corpus_data <- corpus_data |>
  filter(DATE < as.Date("2024-01-01"))

cat("Corpus loaded successfully\n")
Corpus loaded successfully
Show code
cat("Total articles:", format(nrow(corpus_data), big.mark = ","), "\n")
Total articles: 33,692 
Show code
cat("Date range:", as.character(min(corpus_data$DATE)), "to",
    as.character(max(corpus_data$DATE)), "\n")
Date range: 2021-01-01 to 2023-12-31 
Show code
cat("Columns:", ncol(corpus_data), "\n")
Columns: 14 
Show code
corpus_data$.text_lower <- stri_trans_tolower(
  paste(coalesce(corpus_data$TITLE, ""),
        coalesce(corpus_data$FULL_TEXT, ""),
        sep = " ")
)

if (!"year" %in% names(corpus_data)) {
  corpus_data$year <- year(corpus_data$DATE)
}
if (!"year_month" %in% names(corpus_data)) {
  corpus_data$year_month <- floor_date(corpus_data$DATE, "month")
}
if (!"quarter" %in% names(corpus_data)) {
  corpus_data$quarter <- quarter(corpus_data$DATE)
  corpus_data$year_quarter <- paste0(corpus_data$year, " Q", corpus_data$quarter)
}

corpus_data$word_count <- stri_count_regex(corpus_data$FULL_TEXT, "\\S+")

summary_stats <- tibble(
  Metric = c("Total articles", "Unique sources", "Date range",
             "Mean words/article", "Median words/article"),
  Value = c(
    format(nrow(corpus_data), big.mark = ","),
    format(n_distinct(corpus_data$FROM), big.mark = ","),
    paste(min(corpus_data$DATE), "to", max(corpus_data$DATE)),
    round(mean(corpus_data$word_count, na.rm = TRUE)),
    round(median(corpus_data$word_count, na.rm = TRUE))
  )
)

kable(summary_stats, col.names = c("Metric", "Value")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Metric Value
Total articles 33,692
Unique sources 2,034
Date range 2021-01-01 to 2023-12-31
Mean words/article 899
Median words/article 646

2.2 Source distribution

Show code
if ("SOURCE_TYPE" %in% names(corpus_data)) {
  source_dist <- corpus_data |>
    count(SOURCE_TYPE, sort = TRUE) |>
    mutate(pct = round(n / sum(n) * 100, 1))

  kable(source_dist, col.names = c("Source type", "N", "%")) |>
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
}
Source type N %
web 31927 94.8
facebook 727 2.2
youtube 433 1.3
forum 269 0.8
reddit 256 0.8
twitter 67 0.2
comment 13 0.0
Show code
top_sources <- corpus_data |>
  count(FROM, sort = TRUE) |>
  head(25)

ggplot(top_sources, aes(x = reorder(FROM, n), y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  coord_flip() +
  labs(title = "Top 25 sources", x = NULL, y = "Articles")
Figure 1: Top 25 sources by article count

2.3 Article length distribution

Show code
ggplot(corpus_data |> filter(word_count > 0, word_count < quantile(word_count, 0.99, na.rm = TRUE)),
       aes(x = word_count)) +
  geom_histogram(bins = 60, fill = "#2c7bb6", alpha = 0.7, color = "white") +
  geom_vline(xintercept = median(corpus_data$word_count, na.rm = TRUE),
             linetype = "dashed", color = "#d7191c") +
  annotate("label",
           x = median(corpus_data$word_count, na.rm = TRUE),
           y = Inf, vjust = 1.5,
           label = paste("Median:", round(median(corpus_data$word_count, na.rm = TRUE))),
           fill = "white", size = 3.5) +
  labs(title = "Article length distribution",
       subtitle = "Top 1% trimmed. Dashed line marks the median.",
       x = "Word count", y = "Articles")
Figure 2: Distribution of article length (word count)
Show code
if ("SOURCE_TYPE" %in% names(corpus_data)) {
  wc_by_source <- corpus_data |>
    filter(!is.na(SOURCE_TYPE)) |>
    group_by(SOURCE_TYPE) |>
    summarise(
      n       = n(),
      median  = round(median(word_count, na.rm = TRUE)),
      mean    = round(mean(word_count, na.rm = TRUE)),
      p25     = round(quantile(word_count, 0.25, na.rm = TRUE)),
      p75     = round(quantile(word_count, 0.75, na.rm = TRUE)),
      .groups = "drop"
    ) |>
    arrange(desc(median))

  kable(wc_by_source,
        col.names = c("Source type", "N", "Median", "Mean", "P25", "P75")) |>
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
}
Table 1: Article length by source type
Source type N Median Mean P25 P75
web 31927 674 935 439 1105
forum 269 317 425 171 571
youtube 433 232 268 138 359
reddit 256 163 256 87 318
comment 13 95 150 83 149
facebook 727 92 149 55 158
twitter 67 36 35 32 39

2.4 Platform landscape

Show code
corpus_data <- corpus_data |>
  mutate(
    platform = case_when(
      tolower(SOURCE_TYPE) %in% c("web", "internet") ~ "web",
      tolower(SOURCE_TYPE) %in% c("facebook", "fb")  ~ "Facebook",
      tolower(SOURCE_TYPE) %in% c("twitter", "x")    ~ "Twitter",
      tolower(SOURCE_TYPE) == "instagram"             ~ "Instagram",
      tolower(SOURCE_TYPE) == "youtube"               ~ "YouTube",
      tolower(SOURCE_TYPE) == "tiktok"                ~ "TikTok",
      tolower(SOURCE_TYPE) == "reddit"                ~ "Reddit",
      tolower(SOURCE_TYPE) %in% c("forum", "forum.hr") ~ "forum",
      TRUE ~ "Other"
    )
  )
Show code
platforms_with_data <- corpus_data |>
  count(platform) |>
  filter(n >= 20) |>
  pull(platform)

platform_monthly <- corpus_data |>
  filter(platform %in% platforms_with_data) |>
  count(year_month, platform) |>
  filter(!is.na(year_month))

ggplot(platform_monthly, aes(x = year_month, y = n)) +
  geom_col(aes(fill = platform), alpha = 0.7, show.legend = FALSE) +
  geom_smooth(method = "loess", se = FALSE, color = "black",
              linewidth = 0.7, span = 0.3) +
  facet_wrap(~ platform, ncol = 2, scales = "free_y") +
  scale_fill_manual(values = platform_colors) +
  scale_x_date(date_breaks = "6 months", date_labels = "%b\n%Y") +
  labs(title = "Monthly article volume by platform",
       x = NULL, y = "Articles") +
  theme(axis.text.x = element_text(size = 8))
Figure 3: Monthly volume by platform

3 Temporal dynamics

3.1 Monthly volume

Show code
monthly_volume <- corpus_data |>
  count(year_month) |>
  filter(!is.na(year_month))

events_cfg <- CONFIG$events
events <- tibble(
  date  = as.Date(sapply(events_cfg, `[[`, "date")),
  label = sapply(events_cfg, `[[`, "label"),
  y_pos = max(monthly_volume$n, na.rm = TRUE) * seq(0.95, by = -0.10,
                                                     length.out = length(events_cfg))
)

# Keep only events within data range
events <- events |> filter(date <= max(corpus_data$DATE))

ggplot(monthly_volume, aes(x = year_month, y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.7) +
  geom_smooth(method = "loess", se = TRUE, color = "#d7191c", linewidth = 1) +
  geom_vline(data = events, aes(xintercept = date),
             linetype = "dashed", color = "gray40") +
  geom_label(data = events, aes(x = date, y = y_pos, label = label),
             size = 3, fill = "white", alpha = 0.9) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(title = "AI and labour market coverage",
       subtitle = "Monthly volume with LOESS trend and key events",
       x = NULL, y = "Articles")
Figure 4: Monthly article volume with key events

3.2 Yearly volume

Show code
yearly_volume <- corpus_data |>
  count(year) |>
  filter(!is.na(year))

ggplot(yearly_volume, aes(x = factor(year), y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_text(aes(label = format(n, big.mark = ",")), vjust = -0.5) +
  labs(title = "Annual volume", x = "Year", y = "Articles") +
  expand_limits(y = max(yearly_volume$n) * 1.1)
Figure 5: Annual article volume

3.3 Quarterly breakdown

Show code
quarterly_volume <- corpus_data |>
  count(year, quarter) |>
  filter(!is.na(year)) |>
  mutate(yq = paste0(year, " Q", quarter)) |>
  arrange(year, quarter) |>
  mutate(
    lag_n = lag(n, 4),
    yoy_growth = ifelse(!is.na(lag_n) & lag_n > 0,
                        round((n / lag_n - 1) * 100, 1), NA_real_)
  )

p_qvol <- ggplot(quarterly_volume, aes(x = fct_inorder(yq), y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_text(aes(label = n), vjust = -0.5, size = 3) +
  labs(title = "Quarterly volume", x = NULL, y = "Articles") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  expand_limits(y = max(quarterly_volume$n, na.rm = TRUE) * 1.1)

p_qgrowth <- quarterly_volume |>
  filter(!is.na(yoy_growth)) |>
  ggplot(aes(x = fct_inorder(yq), y = yoy_growth,
             fill = yoy_growth > 0)) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  geom_hline(yintercept = 0, color = "gray40") +
  geom_text(aes(label = paste0(yoy_growth, "%"),
                vjust = ifelse(yoy_growth > 0, -0.5, 1.5)),
            size = 3) +
  scale_fill_manual(values = c("TRUE" = "#4daf4a", "FALSE" = "#e41a1c")) +
  labs(title = "Year-over-year growth rate", x = NULL, y = "% change") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_qvol / p_qgrowth
Figure 6: Quarterly article volume with year-over-year growth

3.4 Daily volume heatmap

Show code
daily_volume <- corpus_data |>
  count(DATE) |>
  filter(!is.na(DATE)) |>
  mutate(
    wday = wday(DATE, label = TRUE, week_start = 1),
    week = floor_date(DATE, "week", week_start = 1)
  )

ggplot(daily_volume, aes(x = week, y = wday, fill = n)) +
  geom_tile(color = "white", linewidth = 0.3) +
  scale_fill_gradient(low = "#f7fbff", high = "#08306b",
                      name = "Articles") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(title = "Daily article volume",
       subtitle = "Darker cells indicate higher activity",
       x = NULL, y = NULL) +
  theme(axis.text.y = element_text(size = 9))
Figure 7: Daily article volume (calendar heatmap)

3.5 Day-of-week patterns

Show code
weekday_avg <- corpus_data |>
  mutate(wday = wday(DATE, label = TRUE, week_start = 1)) |>
  count(DATE, wday) |>
  group_by(wday) |>
  summarise(avg = mean(n), .groups = "drop")

ggplot(weekday_avg, aes(x = wday, y = avg)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_text(aes(label = round(avg, 1)), vjust = -0.5, size = 3.5) +
  labs(title = "Publication pattern by day of week",
       x = NULL, y = "Average articles per day") +
  expand_limits(y = max(weekday_avg$avg) * 1.1)
Figure 8: Average daily articles by day of week

4 Frame analysis

4.1 Frame dictionaries

Show code
# Build dictionaries from config
frame_dictionaries <- lapply(CONFIG$frames, `[[`, "keywords")

frame_summary <- tibble(
  Frame       = names(frame_dictionaries),
  Description = sapply(CONFIG$frames, `[[`, "description"),
  Keywords    = sapply(frame_dictionaries, length)
)

kable(frame_summary, col.names = c("Frame", "Description", "N keywords")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Frame Description N keywords
JOB_LOSS AI eliminates jobs 22
JOB_CREATION AI creates new opportunities 13
TRANSFORMATION AI transforms rather than destroys work 16
SKILLS Focus on reskilling and education 14
REGULATION Policy, governance, and worker protection 13
PRODUCTIVITY Economic benefits and efficiency 11
INEQUALITY Distributional concerns and digital divide 11
FEAR_RESISTANCE Anxiety, fear, and opposition to AI 16

4.2 Frame detection

Show code
for (frame_name in names(frame_dictionaries)) {
  pattern <- paste(frame_dictionaries[[frame_name]], collapse = "|")
  corpus_data[[paste0("frame_", frame_name)]] <- stri_detect_regex(
    corpus_data$.text_lower, pattern
  )
}

frame_cols <- paste0("frame_", names(frame_dictionaries))
frame_counts <- corpus_data |>
  summarise(across(all_of(frame_cols), ~ sum(.x, na.rm = TRUE))) |>
  pivot_longer(everything(), names_to = "frame", values_to = "count") |>
  mutate(
    frame = str_remove(frame, "frame_"),
    pct   = round(count / nrow(corpus_data) * 100, 1)
  ) |>
  arrange(desc(count))

kable(frame_counts, col.names = c("Frame", "Articles", "% corpus")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Frame Articles % corpus
PRODUCTIVITY 7748 23.0
FEAR_RESISTANCE 5695 16.9
SKILLS 2372 7.0
TRANSFORMATION 1755 5.2
JOB_CREATION 1550 4.6
REGULATION 1519 4.5
INEQUALITY 664 2.0
JOB_LOSS 651 1.9
Show code
ggplot(frame_counts, aes(x = reorder(frame, count), y = count, fill = frame)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = paste0(pct, "%")), hjust = -0.1, size = 3.5) +
  coord_flip() +
  scale_fill_manual(values = frame_colors, guide = "none") +
  labs(title = "Interpretive frame prevalence",
       subtitle = "Percentage of articles containing frame keywords",
       x = NULL, y = "Articles") +
  expand_limits(y = max(frame_counts$count) * 1.15)
Figure 9: Frame prevalence

4.3 Frame evolution

Show code
frame_monthly <- corpus_data |>
  group_by(year_month) |>
  summarise(
    n_total = n(),
    across(all_of(frame_cols), ~ sum(.x, na.rm = TRUE))
  ) |>
  pivot_longer(cols = all_of(frame_cols),
               names_to = "frame", values_to = "count") |>
  mutate(
    frame = str_remove(frame, "frame_"),
    pct   = count / n_total * 100
  ) |>
  filter(!is.na(year_month))

ggplot(frame_monthly, aes(x = year_month, y = pct, color = frame)) +
  geom_line(linewidth = 0.8, alpha = 0.8) +
  geom_smooth(method = "loess", se = FALSE, linewidth = 1.2, linetype = "dashed") +
  facet_wrap(~frame, ncol = 2, scales = "free_y") +
  scale_color_manual(values = frame_colors, guide = "none") +
  scale_x_date(date_breaks = "6 months", date_labels = "%b\n%Y") +
  labs(title = "Frame evolution",
       subtitle = "Monthly share of articles per frame with LOESS trend",
       x = NULL, y = "% articles") +
  theme(axis.text.x = element_text(size = 8))
Figure 10: Frame evolution over time

4.4 Frame co-occurrence

Show code
# Build frame matrix
frame_matrix <- as.matrix(corpus_data[, frame_cols])
colnames(frame_matrix) <- str_remove(colnames(frame_matrix), "frame_")

# Co-occurrence as proportion: of articles with frame A, what share also has frame B
n_articles <- nrow(frame_matrix)
n_frames <- ncol(frame_matrix)
cooc_matrix <- matrix(0, n_frames, n_frames,
                      dimnames = list(colnames(frame_matrix), colnames(frame_matrix)))

for (i in seq_len(n_frames)) {
  for (j in seq_len(n_frames)) {
    if (sum(frame_matrix[, i]) > 0) {
      cooc_matrix[i, j] <- sum(frame_matrix[, i] & frame_matrix[, j]) /
                            sum(frame_matrix[, i]) * 100
    }
  }
}

cooc_long <- as.data.frame(as.table(cooc_matrix)) |>
  setNames(c("Frame_A", "Frame_B", "pct"))

ggplot(cooc_long, aes(x = Frame_B, y = Frame_A, fill = pct)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(pct, 0)), size = 3) +
  scale_fill_gradient(low = "white", high = "#2c7bb6", name = "% co-occur") +
  labs(title = "Frame co-occurrence",
       subtitle = "Of articles containing row frame, what % also contain column frame",
       x = NULL, y = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 11: Frame co-occurrence heatmap
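The nested loop that fills the co-occurrence matrix is equivalent to a single matrix product, which may be clearer (and faster) on a large corpus. A toy sketch, not the code used above:

```r
# Toy indicator matrix: 3 articles x 3 frames, like corpus_data[, frame_cols]
fm <- cbind(A = c(TRUE, TRUE, FALSE),
            B = c(TRUE, FALSE, FALSE),
            C = c(FALSE, TRUE, TRUE))

# crossprod(fm)[i, j] counts articles containing both frame i and frame j;
# dividing each row by its diagonal gives "% of frame-i articles that also have j"
joint <- crossprod(fm)
cooc  <- sweep(joint, 1, diag(joint), "/") * 100
round(cooc)
#       A   B   C
# A   100  50  50
# B   100 100   0
# C    50   0 100
```

The diagonal is always 100, and the matrix is asymmetric by design: rare frames co-occur with common ones far more often than the reverse.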

4.5 Frame density per article

Show code
corpus_data$n_frames <- rowSums(corpus_data[, frame_cols], na.rm = TRUE)

frame_density <- corpus_data |>
  count(n_frames) |>
  mutate(pct = round(n / sum(n) * 100, 1))

ggplot(frame_density, aes(x = factor(n_frames), y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.5, size = 3.5) +
  labs(title = "Frame density per article",
       subtitle = "How many distinct frames appear within a single article",
       x = "Number of frames detected", y = "Articles") +
  expand_limits(y = max(frame_density$n) * 1.1)
Figure 12: Number of frames detected per article

4.6 Composite threat vs opportunity index

Show code
corpus_data$threat <- as.integer(
  corpus_data$frame_JOB_LOSS | corpus_data$frame_FEAR_RESISTANCE |
  corpus_data$frame_INEQUALITY
)
corpus_data$opportunity <- as.integer(
  corpus_data$frame_JOB_CREATION | corpus_data$frame_PRODUCTIVITY |
  corpus_data$frame_TRANSFORMATION
)

composite_monthly <- corpus_data |>
  group_by(year_month) |>
  summarise(
    n = n(),
    threat_pct = sum(threat, na.rm = TRUE) / n() * 100,
    opportunity_pct = sum(opportunity, na.rm = TRUE) / n() * 100,
    .groups = "drop"
  ) |>
  filter(!is.na(year_month)) |>
  pivot_longer(cols = c(threat_pct, opportunity_pct),
               names_to = "index", values_to = "pct") |>
  mutate(index = ifelse(str_detect(index, "threat"), "Threat", "Opportunity"))

ggplot(composite_monthly, aes(x = year_month, y = pct, color = index)) +
  geom_line(linewidth = 0.6, alpha = 0.5) +
  geom_smooth(method = "loess", span = 0.3, se = TRUE, linewidth = 1.2) +
  scale_color_manual(values = c("Threat" = "#e41a1c", "Opportunity" = "#4daf4a")) +
  labs(title = "Threat vs opportunity narrative",
       subtitle = "Threat = job loss + fear + inequality; Opportunity = creation + productivity + transformation",
       x = NULL, y = "% of articles", color = NULL)
Figure 13: Composite threat vs opportunity frame indices over time

5 Actor analysis

Show code
actor_dictionaries <- CONFIG$actors

for (actor_name in names(actor_dictionaries)) {
  pattern <- paste(actor_dictionaries[[actor_name]], collapse = "|")
  corpus_data[[paste0("actor_", actor_name)]] <- stri_detect_regex(
    corpus_data$.text_lower, pattern
  )
}

actor_cols <- paste0("actor_", names(actor_dictionaries))
actor_counts <- corpus_data |>
  summarise(across(all_of(actor_cols), ~ sum(.x, na.rm = TRUE))) |>
  pivot_longer(everything(), names_to = "actor", values_to = "count") |>
  mutate(
    actor = str_remove(actor, "actor_"),
    pct   = round(count / nrow(corpus_data) * 100, 1)
  ) |>
  arrange(desc(count))

kable(actor_counts, col.names = c("Actor", "Articles", "% corpus")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Actor Articles % corpus
EMPLOYERS 17570 52.1
TECH_COMPANIES 14652 43.5
EXPERTS 12873 38.2
WORKERS 11739 34.8
POLICY_MAKERS 9544 28.3
UNIONS 695 2.1
Show code
ggplot(actor_counts, aes(x = reorder(actor, count), y = count)) +
  geom_col(fill = "#377eb8", alpha = 0.8) +
  geom_text(aes(label = paste0(pct, "%")), hjust = -0.1, size = 3.5) +
  coord_flip() +
  labs(title = "Actor prevalence", x = NULL, y = "Articles") +
  expand_limits(y = max(actor_counts$count) * 1.15)
Figure 14: Actor prevalence

5.1 Actor evolution over time

Show code
actor_monthly <- corpus_data |>
  group_by(year_month) |>
  summarise(
    n_total = n(),
    across(all_of(actor_cols), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  ) |>
  pivot_longer(cols = all_of(actor_cols),
               names_to = "actor", values_to = "count") |>
  mutate(
    actor = str_remove(actor, "actor_"),
    pct   = count / n_total * 100
  ) |>
  filter(!is.na(year_month))

ggplot(actor_monthly, aes(x = year_month, y = pct, color = actor)) +
  geom_line(linewidth = 0.7, alpha = 0.7) +
  geom_smooth(method = "loess", se = FALSE, linewidth = 1, linetype = "dashed") +
  facet_wrap(~ actor, ncol = 2, scales = "free_y") +
  scale_x_date(date_breaks = "6 months", date_labels = "%b\n%Y") +
  labs(title = "Actor prevalence over time",
       subtitle = "Monthly share with LOESS trend",
       x = NULL, y = "% of articles", color = NULL) +
  theme(axis.text.x = element_text(size = 8), legend.position = "none")
Figure 15: Actor prevalence over time

5.2 Actor-frame associations

Show code
actor_frame_assoc <- expand.grid(
  actor = names(actor_dictionaries),
  frame = names(frame_dictionaries),
  stringsAsFactors = FALSE
)

actor_frame_assoc$pct <- mapply(function(a, f) {
  a_col <- paste0("actor_", a)
  f_col <- paste0("frame_", f)
  actor_articles <- corpus_data[[a_col]]
  n_actor <- sum(actor_articles, na.rm = TRUE)
  if (n_actor == 0) return(0)
  sum(actor_articles & corpus_data[[f_col]], na.rm = TRUE) / n_actor * 100
}, actor_frame_assoc$actor, actor_frame_assoc$frame)

ggplot(actor_frame_assoc, aes(x = frame, y = actor, fill = pct)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(pct, 1)), size = 3) +
  scale_fill_gradient(low = "white", high = "#2c7bb6", name = "% of actor articles") +
  labs(title = "Actor-frame associations",
       subtitle = "Of articles mentioning the actor, what % also contain the frame",
       x = NULL, y = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 16: Actor-frame association heatmap

6 Outlet classification

Show code
outlet_cfg <- CONFIG$outlet_types

corpus_data$outlet_type <- "Other"
# Later outlet types in the config overwrite earlier ones where patterns overlap
for (type_name in names(outlet_cfg)) {
  for (pat in outlet_cfg[[type_name]]) {
    matches <- stri_detect_regex(stri_trans_tolower(corpus_data$FROM), pat)
    corpus_data$outlet_type[matches] <- type_name
  }
}

outlet_dist <- corpus_data |>
  count(outlet_type, sort = TRUE) |>
  mutate(pct = round(n / sum(n) * 100, 1))

kable(outlet_dist, col.names = c("Outlet type", "N", "%")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Outlet type N %
Other 22472 66.7
Quality 4702 14.0
Business 2436 7.2
Tabloid 1511 4.5
Tech 1235 3.7
Regional 1025 3.0
Public 311 0.9
Show code
min_articles <- CONFIG$analysis$min_articles_per_outlet

frames_by_outlet <- corpus_data |>
  group_by(outlet_type) |>
  summarise(
    n = n(),
    across(all_of(frame_cols), ~sum(.x, na.rm = TRUE) / n() * 100)
  ) |>
  filter(n >= min_articles) |>
  pivot_longer(cols = all_of(frame_cols),
               names_to = "frame", values_to = "pct") |>
  mutate(frame = str_remove(frame, "frame_"))

ggplot(frames_by_outlet, aes(x = frame, y = pct, fill = outlet_type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Frame prevalence by outlet type",
       x = NULL, y = "% articles", fill = "Outlet type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 17: Frame prevalence by outlet type

6.1 Outlet volume over time

Show code
outlet_monthly <- corpus_data |>
  filter(outlet_type != "Other") |>
  count(year_month, outlet_type) |>
  filter(!is.na(year_month))

ggplot(outlet_monthly, aes(x = year_month, y = n, fill = outlet_type)) +
  geom_area(alpha = 0.7, position = "stack") +
  scale_fill_brewer(palette = "Set2") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(title = "Coverage volume by outlet type",
       x = NULL, y = "Articles", fill = "Outlet type")
Figure 18: Monthly volume by outlet type

6.2 Threat/opportunity by outlet type

Show code
outlet_composite <- corpus_data |>
  filter(outlet_type != "Other") |>
  group_by(outlet_type) |>
  summarise(
    n = n(),
    threat_pct = sum(threat, na.rm = TRUE) / n() * 100,
    opportunity_pct = sum(opportunity, na.rm = TRUE) / n() * 100,
    .groups = "drop"
  ) |>
  mutate(ratio = round(threat_pct / pmax(opportunity_pct, 0.1), 2)) |>
  pivot_longer(cols = c(threat_pct, opportunity_pct),
               names_to = "index", values_to = "pct") |>
  mutate(index = ifelse(str_detect(index, "threat"), "Threat", "Opportunity"))

ggplot(outlet_composite, aes(x = outlet_type, y = pct, fill = index)) +
  geom_col(position = "dodge", alpha = 0.85) +
  scale_fill_manual(values = c("Threat" = "#e41a1c", "Opportunity" = "#4daf4a")) +
  labs(title = "Threat vs opportunity by outlet type",
       x = NULL, y = "% of articles", fill = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 19: Threat vs opportunity framing by outlet type

7 Engagement analysis

Show code
if (all(c("REACH", "INTERACTIONS") %in% names(corpus_data))) {

  p_reach <- corpus_data |>
    filter(!is.na(REACH), REACH > 0,
           REACH < quantile(REACH, 0.99, na.rm = TRUE)) |>
    ggplot(aes(x = REACH)) +
    geom_histogram(bins = 50, fill = "#2c7bb6", alpha = 0.7, color = "white") +
    scale_x_log10(labels = scales::comma) +
    labs(title = "Article reach (log scale)", x = "Reach", y = "Articles")

  p_interactions <- corpus_data |>
    filter(!is.na(INTERACTIONS), INTERACTIONS > 0,
           INTERACTIONS < quantile(INTERACTIONS, 0.99, na.rm = TRUE)) |>
    ggplot(aes(x = INTERACTIONS)) +
    geom_histogram(bins = 50, fill = "#e41a1c", alpha = 0.7, color = "white") +
    scale_x_log10(labels = scales::comma) +
    labs(title = "Article interactions (log scale)", x = "Interactions", y = "Articles")

  p_reach | p_interactions
}
Figure 20: Distribution of article reach and interactions
Show code
if (all(c("REACH", "INTERACTIONS") %in% names(corpus_data))) {

  engagement_by_frame <- lapply(names(frame_dictionaries), function(fname) {
    f_col <- paste0("frame_", fname)
    articles_with <- corpus_data |> filter(.data[[f_col]] == TRUE)

    tibble(
      Frame = fname,
      N = nrow(articles_with),
      Median_reach = round(median(articles_with$REACH, na.rm = TRUE)),
      Median_interactions = round(median(articles_with$INTERACTIONS, na.rm = TRUE)),
      Mean_reach = round(mean(articles_with$REACH, na.rm = TRUE)),
      Mean_interactions = round(mean(articles_with$INTERACTIONS, na.rm = TRUE))
    )
  })

  engagement_tbl <- bind_rows(engagement_by_frame) |> arrange(desc(Median_reach))

  kable(engagement_tbl,
        col.names = c("Frame", "N", "Median Reach", "Median Interactions",
                       "Mean Reach", "Mean Interactions")) |>
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
}
Table 2: Median reach and interactions by dominant frame
Frame N Median Reach Median Interactions Mean Reach Mean Interactions
INEQUALITY 664 533 2 2202 59
FEAR_RESISTANCE 5695 480 2 2631 72
JOB_LOSS 651 450 2 1883 40
SKILLS 2372 450 2 3160 98
JOB_CREATION 1550 435 1 3031 87
REGULATION 1519 434 1 2622 57
TRANSFORMATION 1755 280 0 1559 41
PRODUCTIVITY 7748 261 1 1997 53
Show code
if (all(c("REACH", "INTERACTIONS") %in% names(corpus_data))) {

  engagement_platform <- corpus_data |>
    filter(platform %in% platforms_with_data) |>
    group_by(platform) |>
    summarise(
      n = n(),
      median_reach = round(median(REACH, na.rm = TRUE)),
      median_interactions = round(median(INTERACTIONS, na.rm = TRUE)),
      total_reach = round(sum(REACH, na.rm = TRUE)),
      total_interactions = round(sum(INTERACTIONS, na.rm = TRUE)),
      .groups = "drop"
    ) |>
    arrange(desc(median_reach))

  kable(engagement_platform,
        col.names = c("Platform", "N", "Median Reach", "Median Interactions",
                       "Total Reach", "Total Interactions")) |>
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
}
Table 3: Engagement metrics by platform
Platform N Median Reach Median Interactions Total Reach Total Interactions
Facebook 727 1243 8 6223726 43419
web 31927 360 1 68919863 1742228
YouTube 433 99 3 1115866 14129
Twitter 67 86 0 25662 320
Reddit 256 NA NA 0 0
forum 269 NA NA 0 0

7.1 Top articles by reach

Show code
if ("REACH" %in% names(corpus_data)) {
  top_reach <- corpus_data |>
    filter(!is.na(REACH)) |>
    arrange(desc(REACH)) |>
    head(20) |>
    dplyr::select(DATE, TITLE, FROM, REACH, INTERACTIONS) |>
    mutate(
      TITLE = substr(TITLE, 1, 80),
      REACH = format(REACH, big.mark = ","),
      INTERACTIONS = format(INTERACTIONS, big.mark = ",")
    )

  kable(top_reach, col.names = c("Date", "Title", "Source", "Reach", "Interactions")) |>
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                  full_width = FALSE, font_size = 11) |>
    scroll_box(height = "400px")
}
Table 4: Top 20 articles by estimated reach
Date Title Source Reach Interactions
2021-03-12 Uključi se u e-Savjetovanja! gov.hr 1,606,086 73,008
2023-09-07 DIGITAL HUMAN tema je jedne od najznačajnijih međunarodnih znanstvenih konferenc mnovine.hr 1,280,448 29,222
2023-10-30 Lovci i forenzičari, crveni i plavi timovi mnovine.hr 1,280,448 29,222
2023-11-21 Međimurske novine mnovine.hr 1,280,448 29,222
2022-07-26 Kakav algoritam koriste online casino igre u Hrvatskoj? dnevno.hr 1,133,925 37,005
2021-11-09 TRAŽITE NOVI POSAO I NOVE IZAZOVE? Možda ovi poslodavci trebaju baš vas, a neki dnevno.hr 1,132,440 36,944
2023-02-01 Predstavljamo novi bing.com 863,439 81
2023-02-07 Predstavljamo novi bing.com 863,439 81
2023-06-05 [PODIJELI] Oda lažnim novinarima, Sanji Modrić, Renati Rašović, Borisu Trupčević Građani za Mislava Kolakušića 818,766 13,373
2023-10-23 U posljednjih godinu dana termine poput "AI-a" i "ChatGPT-a" gotovo je nemoguće Mastercard 789,464 362
2022-10-03 SEO OPTIMIZACIJA tolo-design.com 780,147 29,439
2023-10-02 Umjetna inteligencija i budućnost automobilske industrije? Evo što trebate znati TotalEnergies 562,239 1,115
2022-09-03 Kupi moju kuću - Službena Netflixova stranica netflix.com 542,100 0
2022-04-14 ASUS Hrvatska - ExpertBook B1 (B1500) - Recenzija ASUS Hrvatska 364,263 7
2021-05-25 Menadžment Turističkog Smještaja direct-booker.com 353,265 12,681
2023-09-12 Croatia Records crorec.net 316,770 7,104
2021-09-29 ‘Plaće i penzije rastu, vraćaju nam se Hrvati iz inozemstva, doseljavaju stranci jutarnji.hr 266,454 14,076
2021-08-26 Vlasnik firme u Zagrebu uvodi radni tjedan od četiri dana, plaće ostaju iste index.hr 266,379 22,243
2023-06-24 „GlowCast - osobni profesionalni brending kao jedina stabilna valuta u svijetu apple.com 264,300 0
2022-07-01 Uključi se u e-Savjetovanja! gov.hr 217,917 11,315

8 Sentiment analysis

Show code
if ("AUTO_SENTIMENT" %in% names(corpus_data)) {
  sentiment_dist <- corpus_data |>
    count(AUTO_SENTIMENT, sort = TRUE) |>
    mutate(pct = round(n / sum(n) * 100, 1))

  kable(sentiment_dist, col.names = c("Sentiment", "N", "%")) |>
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
} else {
  cat("AUTO_SENTIMENT column not available in corpus.\n")
}
Sentiment N %
positive 22644 67.2
negative 5973 17.7
neutral 5075 15.1
Show code
if ("AUTO_SENTIMENT" %in% names(corpus_data)) {
  sentiment_monthly <- corpus_data |>
    filter(!is.na(AUTO_SENTIMENT) & !is.na(year_month)) |>
    count(year_month, AUTO_SENTIMENT) |>
    group_by(year_month) |>
    mutate(pct = n / sum(n)) |>
    ungroup()

  ggplot(sentiment_monthly, aes(x = year_month, y = pct, fill = AUTO_SENTIMENT)) +
    geom_area(alpha = 0.7) +
    scale_fill_manual(values = sentiment_colors) +
    scale_y_continuous(labels = scales::percent) +
    scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
    labs(title = "Sentiment over time",
         x = NULL, y = "Share", fill = "Sentiment")
}
Figure 21: Sentiment distribution over time

8.1 Sentiment by frame

Show code
if ("AUTO_SENTIMENT" %in% names(corpus_data)) {

  sentiment_frame <- lapply(names(frame_dictionaries), function(fname) {
    f_col <- paste0("frame_", fname)
    corpus_data |>
      filter(.data[[f_col]] == TRUE, !is.na(AUTO_SENTIMENT)) |>
      count(AUTO_SENTIMENT) |>
      mutate(
        frame = fname,
        pct = n / sum(n) * 100
      )
  })

  sentiment_frame_df <- bind_rows(sentiment_frame)

  ggplot(sentiment_frame_df,
         aes(x = frame, y = pct, fill = AUTO_SENTIMENT)) +
    geom_col(alpha = 0.85) +
    scale_fill_manual(values = sentiment_colors) +
    labs(title = "Sentiment composition by frame",
         subtitle = "Some frames carry systematically more negative sentiment",
         x = NULL, y = "% of articles", fill = "Sentiment") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
Figure 22: Sentiment composition by frame

8.2 Sentiment by outlet type

Show code
if ("AUTO_SENTIMENT" %in% names(corpus_data)) {

  sentiment_outlet <- corpus_data |>
    filter(!is.na(AUTO_SENTIMENT), outlet_type != "Other") |>
    count(outlet_type, AUTO_SENTIMENT) |>
    group_by(outlet_type) |>
    mutate(pct = n / sum(n) * 100) |>
    ungroup()

  ggplot(sentiment_outlet, aes(x = outlet_type, y = pct, fill = AUTO_SENTIMENT)) +
    geom_col(alpha = 0.85) +
    scale_fill_manual(values = sentiment_colors) +
    labs(title = "Sentiment by outlet type",
         x = NULL, y = "% of articles", fill = "Sentiment") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
Figure 23: Sentiment distribution by outlet type

9 Keyword analysis

9.1 Most frequent AI terms

Show code
ai_keywords <- list(
  "umjetna inteligencija" = "umjetn.*inteligencij",
  "strojno učenje"        = "strojn.*učenj",
  "ChatGPT"               = "chat.?gpt",
  "GPT-4/GPT-3"           = "gpt.?[34]",
  "OpenAI"                = "openai|open ai",
  "generativni AI"        = "generativn.*(ai|umjetn)",
  "automatizacija"        = "automatizacij",
  "robotizacija"          = "robotizacij",
  "algoritam"             = "algoritm",
  "neuronska mreža"       = "neuronsk|neuralna",
  "chatbot"               = "chatbot",
  "LLM"                   = "\\bllm\\b",
  "Gemini"                = "\\bgemini\\b",
  "Copilot"               = "copilot",
  "duboko učenje"         = "duboko.*učenj"
)

ai_freq <- sapply(ai_keywords, function(pat) {
  sum(stri_detect_regex(corpus_data$.text_lower, pat), na.rm = TRUE)
})

ai_freq_tbl <- tibble(
  Term = names(ai_freq),
  Articles = unname(ai_freq),
  Pct = round(ai_freq / nrow(corpus_data) * 100, 1)
) |> arrange(desc(Articles))

kable(ai_freq_tbl, col.names = c("AI Term", "Articles", "% corpus")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 5: Frequency of individual AI keyword matches
AI Term Articles % corpus
umjetna inteligencija 22637 67.2
automatizacija 7521 22.3
ChatGPT 3150 9.3
strojno učenje 3095 9.2
algoritam 2221 6.6
chatbot 2042 6.1
OpenAI 1661 4.9
generativni AI 1612 4.8
robotizacija 1010 3.0
neuronska mreža 893 2.7
GPT-4/GPT-3 508 1.5
duboko učenje 467 1.4
LLM 218 0.6
Copilot 183 0.5
Gemini 44 0.1
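The stem-based patterns in the dictionary above are written to catch Croatian case inflections rather than exact phrases. A quick base-R check with `grepl()` (which behaves the same as `stri_detect_regex()` for these patterns) shows how one pattern covers several inflected forms; the example sentences are invented for illustration.

```r
# The stem "umjetn" followed by "inteligencij" matches inflected forms
# such as "umjetne inteligencije" or "umjetnom inteligencijom", but not
# the reversed word order. Example strings are hypothetical.
pat <- "umjetn.*inteligencij"

examples <- c("razvoj umjetne inteligencije",
              "radi s umjetnom inteligencijom",
              "inteligencija umjetna")          # reversed order: no match

grepl(pat, examples)  # TRUE TRUE FALSE
```

The asymmetry in the last case is a known limitation of ordered stem patterns: a noun-first inversion slips through, though such word order is rare in Croatian news prose.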

9.2 Most frequent labour terms

Show code
labour_keywords <- list(
  "posao/poslovi"           = "\\bposao|\\bposlovi|\\bposlove|\\bposlova",
  "zaposleni/zapošljavanje" = "zaposlen|zapošljav",
  "nezaposlenost"           = "nezaposlen",
  "radno mjesto"            = "radn.*mjest",
  "tržište rada"            = "tržišt.*rada",
  "vještine"                = "vještin",
  "kompetencije"            = "kompetencij",
  "karijera"                = "karijer",
  "produktivnost"           = "produktivnost",
  "otpuštanje"              = "otpuštan",
  "prekvalifikacija"        = "prekvalifikacij|dokvalifikacij",
  "plaća"                   = "\\bplaća|\\bplaće|\\bplaću",
  "poslodavac"              = "poslodav",
  "radna snaga"             = "radn.*snag",
  "zanimanje"               = "zaniman"
)

labour_freq <- sapply(labour_keywords, function(pat) {
  sum(stri_detect_regex(corpus_data$.text_lower, pat), na.rm = TRUE)
})

labour_freq_tbl <- tibble(
  Term = names(labour_freq),
  Articles = unname(labour_freq),
  Pct = round(labour_freq / nrow(corpus_data) * 100, 1)
) |> arrange(desc(Articles))

kable(labour_freq_tbl, col.names = c("Labour Term", "Articles", "% corpus")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 6: Frequency of individual labour market keyword matches
Labour Term Articles % corpus
posao/poslovi 21881 64.9
zaposleni/zapošljavanje 12188 36.2
radno mjesto 8080 24.0
vještine 7078 21.0
plaća 6050 18.0
tržište rada 4348 12.9
karijera 4218 12.5
radna snaga 3275 9.7
produktivnost 2949 8.8
poslodavac 2528 7.5
zanimanje 2217 6.6
kompetencije 1955 5.8
nezaposlenost 987 2.9
otpuštanje 580 1.7
prekvalifikacija 270 0.8

9.3 AI term evolution

Show code
key_ai_terms <- list(
  "ChatGPT"      = "chat.?gpt",
  "AI/UI"        = "umjetn.*inteligencij",
  "automatizacija" = "automatizacij",
  "algoritam"    = "algoritm",
  "robotizacija" = "robotizacij"
)

ai_term_monthly <- lapply(names(key_ai_terms), function(term_name) {
  corpus_data |>
    group_by(year_month) |>
    summarise(
      pct = sum(stri_detect_regex(.text_lower, key_ai_terms[[term_name]]),
                na.rm = TRUE) / n() * 100,
      .groups = "drop"
    ) |>
    filter(!is.na(year_month)) |>
    mutate(term = term_name)
})

ai_term_monthly_df <- bind_rows(ai_term_monthly)

ggplot(ai_term_monthly_df, aes(x = year_month, y = pct, color = term)) +
  geom_line(linewidth = 0.7) +
  labs(title = "AI term prevalence over time",
       subtitle = "Share of articles mentioning each term",
       x = NULL, y = "% of articles", color = "Term") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y")
Figure 24: Selected AI term mention rates over time
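A natural follow-up to the series above is a pre/post comparison around the ChatGPT launch (late November 2022). The sketch below uses hypothetical monthly shares, not the real `ai_term_monthly_df` values, to show the minimal shape of such a comparison; the formal break-point analysis belongs to the subsequent econometric work mentioned in the abstract.

```r
# Minimal pre/post comparison around the ChatGPT launch (Nov 2022),
# sketched on hypothetical monthly shares of articles mentioning the term.
months <- seq(as.Date("2022-09-01"), as.Date("2023-02-01"), by = "month")
share  <- c(0.1, 0.2, 0.3, 8.5, 12.1, 14.0)   # % of articles, illustrative

post <- months >= as.Date("2022-12-01")        # first full post-launch month

# Jump in the mean monthly share after launch
mean(share[post]) - mean(share[!post])
```

On real data the same two-line comparison gives a first-pass magnitude for the discontinuity before any formal structural-break test is run.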

10 Frame dynamics by platform

Show code
platform_frames <- corpus_data |>
  filter(platform %in% platforms_with_data) |>
  group_by(platform) |>
  summarise(
    n = n(),
    across(all_of(frame_cols), ~ sum(.x, na.rm = TRUE) / n() * 100),
    .groups = "drop"
  ) |>
  pivot_longer(cols = all_of(frame_cols),
               names_to = "frame", values_to = "pct") |>
  mutate(frame = str_remove(frame, "frame_"))

ggplot(platform_frames, aes(x = frame, y = pct, fill = platform)) +
  geom_col(position = "dodge", alpha = 0.85) +
  scale_fill_manual(values = platform_colors) +
  labs(title = "Frame prevalence by platform",
       x = NULL, y = "% of articles", fill = "Platform") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 25: Frame prevalence by platform
Show code
ggplot(platform_frames, aes(x = frame, y = platform, fill = pct)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(pct, 1)), size = 3) +
  scale_fill_gradient2(low = "white", mid = "#abd9e9", high = "#2c7bb6",
                       midpoint = median(platform_frames$pct)) +
  labs(title = "Platform x frame heatmap",
       x = NULL, y = NULL, fill = "% of articles") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 26: Platform x frame heatmap

11 Sample articles

Show code
set.seed(CONFIG$analysis$seed)
sample_articles <- corpus_data |>
  slice_sample(n = min(30, nrow(corpus_data))) |>
  dplyr::select(DATE, TITLE, FROM) |>
  arrange(DATE)

kable(sample_articles, col.names = c("Date", "Title", "Source")) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 11) |>
  scroll_box(height = "400px")
Date Title Source
2021-01-03 Produktivnost bez BS-a split-techcity.com
2021-01-07 Serengeti želi rast u zapadnoj Europi, a 2020. završavaju na sličnim razinama prihoda i dobiti kao 2019. ictbusiness.info
2021-01-10 Stručnjaci predviđaju da bi u idućih 20 godina ovih 12 poslova moglo nestati. Je li i vaš među njima? sbonline.net
2021-08-20 Musk predstavio humanoidnog robota: ‘Rad će u budućnosti biti stvar izbora pojedinca‘ jutarnji.hr
2021-09-23 Huawei predstavio nadolazeće trendove u idućem desetljeću seebiz.eu
2021-11-14 Aktualni trendovi u digitalizaciji Hrvatske, Unije - DESI grabancijas.com
2021-11-24 Novi poslovi u Osijeku i okolici: Traže Vas Thermia kamini, Ancona grupa, Mehanotehna... osijekexpress.com
2022-01-11 'Sve na 15 minuta', inteligentna mobilnost... Što Zagreb može prigrliti već za 8 godina? - Poslovni dnevnik poslovni.hr
2022-01-24 Tko je nasljednica Dubravke Vrgoč? Opernu primadonu već slave u HNK, a u velikom intervju otkrila je ambiciozne planove zagreb.info
2022-01-27 AC Group: Vlastito iskustvo i znanje pretočili u razvoj uspješne obiteljske firme varazdinske-vijesti.hr
2022-02-22 ManpowerGroup SEE uvodi četverodnevni radni tjedan jatrgovac.com
2022-03-21 poslovniFM poslovnifm.com
2022-04-27 Samsung Galaxy S22 serija pametnih telefona i dobar AI su dokaz besprijekorne kvalitete hcl.hr
2022-06-09 RITTAL i KONČAR - Digital partneri u izgradnji podatkovnih centara i digitalizaciji regije - Poslovni dnevnik poslovni.hr
2022-08-08 Kurkumin - polifenol koji ne prestaje biti zagonetka inpharma.hr
2022-09-16 Matej Dujmović: Tvrtke moraju početi razmišljati o automatizaciji skladišnih i proizvodnih procesa lidermedia.hr
2022-11-24 Preko 70 stručnih predavača 1.12. u Zagrebu na konferenciji koja mijenja sve! 24sata.hr
2023-01-06 Terra Meera - čudesno mjesto mira i regeneracije u dalmatinskom zaleđu grazia.hr
2023-04-06 Bowery Farming - integrirane farme gospodarski.hr
2023-04-29 Ključno je postići suživot ljudi i strojeva, odnosno umjetne inteligencije itnovosti.com
2023-05-27 Sektori industrije i infrastrukture s najboljim izgledima seebiz.eu
2023-06-08 Snowden ponovno upozorava svijet! Njegovo otkriće je 'dječja igra' u odnosu na to što danas rade ljudima: Nadzor otišao predaleko dnevno.hr
2023-06-30 sudjelovalo na predstavljanju Europskog digitalnog centra inovacija CROBOHUB++ - Sveučilišni računski centar (Srce) unizg.hr
2023-07-17 Vijeće sigurnosti UN-a održat će prvu sjednicu o umjetnoj inteligenciji novilist.hr
2023-07-25 Appleova AI mogla bi značajno poboljšati iPhone, evo i kako dnevnik.hr
2023-07-26 Tomislav Miletić - Trgovci Novim I Rabljim Automobilima Doživjet Će Darqinovu Evoluciju lidermedia.hr
2023-08-30 PC Chip pcchip.hr
2023-09-28 OpenAI bi mogao vrijediti gotovo 100 milijardi dolara tportal.hr
2023-10-20 M. Kepec: Kako postati uspješan DevOps inženjer glas-slavonije.hr
2023-11-20 Prošli tjedan smo kroz naš PPK (PAR Poduzetnički Kamp) uronili u svijet poduzetništva! 🚀 Započeli smo s predavanjem o temi, formirali timove i započeli... Veleučilište PAR


12 Summary

Show code
n_articles    <- nrow(corpus_data)
date_range    <- paste(min(corpus_data$DATE), "to", max(corpus_data$DATE))
n_sources     <- n_distinct(corpus_data$FROM)
top_frame     <- frame_counts$frame[1]
top_frame_pct <- frame_counts$pct[1]

# Frame summary
cat("ANALYSIS SUMMARY\n")
ANALYSIS SUMMARY
Show code
cat("================\n\n")
================
Show code
cat("Total articles:", format(n_articles, big.mark = ","), "\n")
Total articles: 33,692 
Show code
cat("Period:", date_range, "\n")
Period: 2021-01-01 to 2023-12-31 
Show code
cat("Sources:", format(n_sources, big.mark = ","), "\n")
Sources: 2,034 
Show code
cat("Dominant frame:", top_frame, "(", top_frame_pct, "% of articles)\n\n")
Dominant frame: PRODUCTIVITY ( 23 % of articles)
Show code
# Frame density
cat("Frame density:\n")
Frame density:
Show code
cat("  Articles with 0 frames:", sum(corpus_data$n_frames == 0), "\n")
  Articles with 0 frames: 17781 
Show code
cat("  Articles with 1 frame:", sum(corpus_data$n_frames == 1), "\n")
  Articles with 1 frame: 11215 
Show code
cat("  Articles with 2+ frames:", sum(corpus_data$n_frames >= 2), "\n\n")
  Articles with 2+ frames: 4696 
Show code
# Threat vs opportunity
cat("Composite indices:\n")
Composite indices:
Show code
cat("  Articles with threat framing:", sum(corpus_data$threat, na.rm = TRUE),
    "(", round(mean(corpus_data$threat, na.rm = TRUE) * 100, 1), "%)\n")
  Articles with threat framing: 6433 ( 19.1 %)
Show code
cat("  Articles with opportunity framing:", sum(corpus_data$opportunity, na.rm = TRUE),
    "(", round(mean(corpus_data$opportunity, na.rm = TRUE) * 100, 1), "%)\n")
  Articles with opportunity framing: 9685 ( 28.7 %)
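The threat/opportunity ratio reported later in the summary table is simply the count of threat-framed articles divided by the count of opportunity-framed articles. The sketch below reproduces it from the counts printed above (hard-coded here for illustration).

```r
# Threat/opportunity ratio, reproduced from the counts printed above.
# A value below 1 means opportunity framing outnumbers threat framing.
threat_n      <- 6433
opportunity_n <- 9685

round(threat_n / opportunity_n, 2)  # 0.66
```

Equivalently, for every two threat-framed articles the corpus contains roughly three opportunity-framed ones.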
Show code
summary_comprehensive <- tibble(
  Category = c(
    rep("Corpus", 5),
    rep("Frames", 4),
    rep("Actors", 2),
    rep("Platforms", 2)
  ),
  Metric = c(
    "Total articles", "Unique sources", "Date range",
    "Median article length (words)", "Mean article length (words)",
    "Dominant frame", "Articles with any frame",
    "Mean frames per article", "Threat/opportunity ratio",
    "Most mentioned actor", "Least mentioned actor",
    "Dominant platform", "Number of platforms"
  ),
  Value = c(
    format(nrow(corpus_data), big.mark = ","),
    format(n_distinct(corpus_data$FROM), big.mark = ","),
    paste(min(corpus_data$DATE), "to", max(corpus_data$DATE)),
    round(median(corpus_data$word_count, na.rm = TRUE)),
    round(mean(corpus_data$word_count, na.rm = TRUE)),
    paste0(frame_counts$frame[1], " (", frame_counts$pct[1], "%)"),
    paste0(sum(corpus_data$n_frames > 0), " (",
           round(mean(corpus_data$n_frames > 0) * 100, 1), "%)"),
    round(mean(corpus_data$n_frames), 2),
    round(sum(corpus_data$threat, na.rm = TRUE) /
          max(sum(corpus_data$opportunity, na.rm = TRUE), 1), 2),
    paste0(actor_counts$actor[1], " (", actor_counts$pct[1], "%)"),
    paste0(actor_counts$actor[nrow(actor_counts)], " (",
           actor_counts$pct[nrow(actor_counts)], "%)"),
    paste0(source_dist$SOURCE_TYPE[1], " (", source_dist$pct[1], "%)"),
    length(platforms_with_data)
  )
)

kable(summary_comprehensive, col.names = c("Category", "Metric", "Value")) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  pack_rows(index = table(summary_comprehensive$Category)[
    unique(summary_comprehensive$Category)
  ])
Table 7: Comprehensive corpus statistics
Category Metric Value
Corpus
Corpus Total articles 33,692
Corpus Unique sources 2,034
Corpus Date range 2021-01-01 to 2023-12-31
Corpus Median article length (words) 646
Corpus Mean article length (words) 899
Frames
Frames Dominant frame PRODUCTIVITY (23%)
Frames Articles with any frame 15911 (47.2%)
Frames Mean frames per article 0.65
Frames Threat/opportunity ratio 0.66
Actors
Actors Most mentioned actor EMPLOYERS (52.1%)
Actors Least mentioned actor UNIONS (2.1%)
Platforms
Platforms Dominant platform web (94.8%)
Platforms Number of platforms 6

13 Data export

Show code
export_data <- corpus_data |> dplyr::select(-`.text_lower`)
saveRDS(export_data, path_analysed_corpus)

14 Session info

Show code
sessionInfo()
R version 4.5.2 (2025-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=Croatian_Croatia.utf8  LC_CTYPE=Croatian_Croatia.utf8   
[3] LC_MONETARY=Croatian_Croatia.utf8 LC_NUMERIC=C                     
[5] LC_TIME=Croatian_Croatia.utf8    

time zone: Europe/Zagreb
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidytext_0.4.3            quanteda.textstats_0.97.2
 [3] quanteda_4.3.1            kableExtra_1.4.0         
 [5] knitr_1.50                ggrepel_0.9.6            
 [7] patchwork_1.3.2           scales_1.4.0             
 [9] ggplot2_4.0.1             tibble_3.3.0             
[11] forcats_1.0.1             lubridate_1.9.4          
[13] stringi_1.8.7             stringr_1.6.0            
[15] tidyr_1.3.1               dplyr_1.1.4              
[17] yaml_2.3.11              

loaded via a namespace (and not attached):
 [1] janeaustenr_1.0.0  generics_0.1.4     xml2_1.5.1         lattice_0.22-7    
 [5] digest_0.6.39      magrittr_2.0.4     evaluate_1.0.5     grid_4.5.2        
 [9] timechange_0.3.0   RColorBrewer_1.1-3 fastmap_1.2.0      jsonlite_2.0.0    
[13] Matrix_1.7-4       mgcv_1.9-3         stopwords_2.3      purrr_1.2.0       
[17] viridisLite_0.4.2  textshaping_1.0.4  cli_3.6.5          rlang_1.1.6       
[21] tokenizers_0.3.0   splines_4.5.2      withr_3.0.2        tools_4.5.2       
[25] fastmatch_1.1-6    vctrs_0.6.5        R6_2.6.1           lifecycle_1.0.4   
[29] htmlwidgets_1.6.4  pkgconfig_2.0.3    pillar_1.11.1      gtable_0.3.6      
[33] glue_1.8.0         Rcpp_1.1.0         systemfonts_1.3.1  xfun_0.54         
[37] tidyselect_1.2.1   rstudioapi_0.17.1  dichromat_2.0-0.1  farver_2.1.2      
[41] nlme_3.1-168       SnowballC_0.7.1    htmltools_0.5.8.1  labeling_0.4.3    
[45] rmarkdown_2.30     svglite_2.2.2      compiler_4.5.2     nsyllable_1.0.1   
[49] S7_0.2.1